Make URL case insensitive. #1350

roygold7 · 2018-10-04T08:41:57Z

URLs starting with hTTp or Https will now be captured by the regular expression.

styfle · 2018-10-04T15:22:50Z

@roygold7 Thanks for the PR! Can you add a unit test for an uppercase url?

@UziTech Do you think this will affect the email replacement?

@davisjam I don't think this will introduce ReDoS but please review just in case.

lib/marked.js

davisjam · 2018-10-04T16:51:43Z

Case-insensitivity does not affect this regex's behavior w.r.t. ReDoS. That is not universally true, though; for some regexes it would matter.

UziTech · 2018-10-04T18:22:54Z

Should other regexps also be case-insensitive? I'm thinking of block.html and block._tag.

Maybe a better question is: are there any regexes that need to be case-sensitive? Or should we find a way to make all regexes case-insensitive by default?

styfle · 2018-10-04T19:41:20Z

lib/marked.js

@@ -606,7 +606,7 @@ inline.pedantic = merge({}, inline.normal, {
 inline.gfm = merge({}, inline.normal, {
  escape: edit(inline.escape).replace('])', '~|])').getRegex(),
  _extended_email: /[A-Za-z0-9._+-]+(@)[a-zA-Z0-9-_]+(?:\.[a-zA-Z0-9-_]*[a-zA-Z0-9])+(?![-_])/,
-  url: /^((?:ftp|https?):\/\/|www\.)(?:[a-zA-Z0-9\-]+\.?)+[^\s<]*|^email/,
+  url: /^((?:ftp|https?):\/\/|www\.)(?:[a-zA-Z0-9\-]+\.?)+[^\s<]*|^email/i,


Now that this is case-insensitive, could you also change a-zA-Z to a-z?
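To illustrate the suggestion above: with the `i` flag, `[a-z]` already matches uppercase letters, so `a-zA-Z` becomes redundant. This is a minimal sketch using a simplified copy of the pattern from the diff; dropping the `|^email` placeholder and the names `urlBefore`/`urlAfter` are assumptions made for illustration, not marked's final regex.

```javascript
// Simplified from the diff: the `|^email` placeholder is omitted for illustration.
// urlBefore: original character classes, no flag (case-sensitive).
const urlBefore = /^((?:ftp|https?):\/\/|www\.)(?:[a-zA-Z0-9\-]+\.?)+[^\s<]*/;
// urlAfter: `a-zA-Z` shortened to `a-z`, relying on the `i` flag instead.
const urlAfter = /^((?:ftp|https?):\/\/|www\.)(?:[a-z0-9\-]+\.?)+[^\s<]*/i;

const samples = [
  'http://example.com',
  'hTTp://Example.COM',
  'Https://example.com',
  'WWW.example.com'
];

// Only the all-lowercase sample matches urlBefore; every sample matches urlAfter.
for (const s of samples) {
  console.log(s, urlBefore.test(s), urlAfter.test(s));
}
```

Note that the flag also makes the `www\.` prefix case-insensitive, not just the scheme.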

styfle · 2018-10-04T19:56:19Z

@UziTech I think it already is invoked as case-insensitive

marked/lib/marked.js

Line 66 in 971304a

block.html = edit(block.html, 'i')

UziTech · 2018-10-04T20:49:10Z

Looks like this could be solved more easily by just changing line 618 from

inline.gfm.url = edit(inline.gfm.url)

to

inline.gfm.url = edit(inline.gfm.url, 'i')
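A sketch of why passing the flag to `edit` matters: a `RegExp`'s `.source` does not include its flags, so any helper that rebuilds a regex from `.source` has to be given the flags again. The `edit` function below is a simplified stand-in written for this example (an assumption, not a copy of marked's helper), and the short `literal` pattern is hypothetical.

```javascript
// Simplified stand-in for an edit-style helper (assumption for illustration):
// it rebuilds a regex from its source text plus explicitly supplied flags.
function edit(regex, opt) {
  let source = regex.source || regex; // .source carries the pattern only, no flags
  opt = opt || '';
  return {
    replace(name, val) {
      source = source.replace(name, val.source || val);
      return this;
    },
    getRegex() {
      return new RegExp(source, opt);
    }
  };
}

// Hypothetical short pattern, flagged case-insensitive on the literal.
const literal = /^(?:ftp|https?):\/\/[^\s<]*/i;

const rebuiltWithoutFlag = edit(literal).getRegex();   // flag on the literal is lost
const rebuiltWithFlag = edit(literal, 'i').getRegex(); // flag supplied explicitly

console.log(literal.test('hTtP://example.com'));            // true
console.log(rebuiltWithoutFlag.test('hTtP://example.com')); // false
console.log(rebuiltWithFlag.test('hTtP://example.com'));    // true
```

If the real helper rebuilds from `.source` in the same way, a flag written only on the regex literal would not survive the rebuild, which is one reason to set it at the `edit` call.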

Martii · 2018-10-04T21:13:45Z

Careful on this PR... See https://url.spec.whatwg.org/#url-writing and the keyword "not" floating around in there for certain schemes.

UziTech · 2018-10-04T21:23:54Z

We are only matching the ftp and https? schemes, which seem to be fine as case-insensitive.

Martii · 2018-10-04T21:50:16Z

Here's at least one where I am thinking it should be double checked and possibly added to the tests:

a URL-scheme string that is not an ASCII case-insensitive match for a special scheme, followed by U+003A (:) and a relative-URL string

Removing the double negative (emphasized font in immediate previous quote) ... it seems like this use case may need case sensitivity:

a URL-scheme string that is an ASCII case-sensitive match for a special scheme, followed by U+003A (:) and a relative-URL string

In this browser the Address bar does take a mix of upper case absolute URLs but then copying it out it always lower cases it... so that's how this browser handles it at least.

UziTech · 2018-10-05T03:09:22Z

The spec says it "must be one of the following" and ftp/http(s) matches

a URL-scheme string that is an ASCII case-insensitive match for a special scheme and not an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-special-URL string
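The quoted rule can be read as a small predicate. The sketch below is one interpretation, not code from the spec or from marked: the function names are hypothetical, and the list of special schemes (ftp, file, http, https, ws, wss) is taken from the WHATWG URL Standard.

```javascript
// Special schemes as listed in the WHATWG URL Standard.
const SPECIAL_SCHEMES = ['ftp', 'file', 'http', 'https', 'ws', 'wss'];

// ASCII-only lowercasing, so non-ASCII lookalike letters are never folded.
function asciiLower(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Hypothetical helper: true when the scheme is an ASCII case-insensitive match
// for a special scheme and is not an ASCII case-insensitive match for "file".
function matchesQuotedRule(scheme) {
  const lower = asciiLower(scheme);
  return SPECIAL_SCHEMES.includes(lower) && lower !== 'file';
}

console.log(matchesQuotedRule('hTtP'));   // true
console.log(matchesQuotedRule('FTP'));    // true
console.log(matchesQuotedRule('FILE'));   // false
console.log(matchesQuotedRule('mailto')); // false
```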

UziTech · 2018-10-05T03:18:50Z

also looks like github doesn't mind if it is not lowercase.

hTtP://example.com

Martii · 2018-10-05T04:47:03Z

> also looks like github doesn't mind if it is not lowercase.

Doesn't mean that GFM is following output guidelines from the W3 for well written HTML code conformance.

The decisions are these:

1. Fix a nuisance and change the incomplete specification to a potentially larger breaking change (this PR and a move to 0.6.0 of marked with all use case scenarios). This will trickle up the chain to all sanitizers and top-level code (which is why I'm commenting here). Quite frankly in my circles writing hTTp, or whatever variant, is poor coding and communication practice but the W3 says it's okay in certain circumstances. *shrugs*. Wonder how node handles this with plucking individual pieces out in URL API?
2. Check the input to see if there is a relative url and conditionalize the insensitive vs. sensitive operation to spec... potentially smaller breaking (only a use case scenario) but additive towards the compliance of the spec instead of subtraction. e.g. standard I/O... check the Input for potentially bad input and output the correct syntax specification. The text can always include the multi-cased http but the url itself should be normalized in this scenario.
3. Keep it as is ... still incomplete specification however unbreaking atm.

We'll accommodate regardless of whether this goes through, but filtering will be affected with items 1 and 2. This is probably why item 3 exists: it is a balance, perhaps. Hence why I said be careful and haven't bothered to vote prematurely either way.
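The second option above (keep the text as typed, normalize the URL itself) could look roughly like the following. This is a hypothetical sketch, not what marked does: `autolink` is an invented name, and it relies on the WHATWG `URL` class that Node.js exposes as a global from v10.

```javascript
// Hypothetical helper: keep the visible text exactly as written,
// but emit an href normalized by the WHATWG URL parser.
function autolink(text) {
  let href;
  try {
    href = new URL(text).href; // lowercases the scheme and host
  } catch (e) {
    return null; // not an absolute URL, so nothing to link
  }
  return { text, href };
}

const link = autolink('hTtP://EXAMPLE.com/Some/Path');
console.log(link.text); // hTtP://EXAMPLE.com/Some/Path
console.log(link.href); // http://example.com/Some/Path
```

The path keeps its case because only the scheme and host are case-insensitive.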

UziTech · 2018-10-05T13:40:06Z

> Doesn't mean that GFM is following output guidelines from the W3 for well written HTML code conformance.

This only affects inline url when gfm is turned on (which means they want it to act like GitHub).

Martii · 2018-10-05T19:32:05Z

> act like GitHub

One place this could be addressed is in github/cmark; however, there's no real unit test that I know of, since the W3 spec mentions bases and relative URLs, which is the use-case exception. I'm not against this PR, just noting what effects it will have. Filtering is where we will need to adjust to this, most likely by forcing the scheme to lowercase (for security integrity). I still need to test this in node with the URL API (when I get back to my dev station next week) and see whether "parting it out" breaks natively... it may not affect it at all or could affect it adversely. Quite honestly, I've never seen anyone use a mix of upper- and lower-case schemes in another language, including JavaScript, so it caught my attention. GFM output will most likely be normalized if it isn't already. This should probably be tested in CommonMark too, but again, from my understanding, I don't know if they have a base URL floating around to test against. :)

styfle · 2018-10-07T21:43:12Z

This works just fine:

const str = 'hTtP://example.com';
const url = new URL(str);
console.log('node\t', process.version);
console.log('input\t', str);
console.log('url\t', url.href);

https://glot.io/snippets/f5if7pn2lf

Martii · 2018-10-07T23:48:14Z

A little more on the test range:

$ node -v && node -e 'var x = new URL("/doc/this.html", "hTtP://example.com"); console.log(x.protocol)'
v10.11.0
http:

... this passes (pre PR and without this dep)... and by the spec it seems like it shouldn't but that's probably a node issue to be raised.

Haven't had a chance to test the legacy URL API yet and probably older versions of node that are still active... little later perhaps if I have time this evening.

Expect this one to pass since it uses an absolute url and legacy API (again without the PR and this dep):

$ node -v && node -e 'var url = require("url"); var x = url.parse("hTtP://example.com/doc/this.html"); console.log(x.protocol)'
v10.11.0
http:

So basically an input/output page from GitHub itself, showing how GH parses this when a base is specified, would be helpful for a final test... still contemplating whether a local project's i/o page could handle this.

Will test our sanitizer shortly to see if it's handling it the same way as node... our code is using case insensitive tests so we don't have to change that part... just checking sanitizer to be ultra-safe since there could be catastrophic results with across-the-pond projects and up/down-stream.

Passes with filtering on our sanitizer (post PR and with this dep)... albeit this isn't all sanitizers. It doesn't care whether it's http or HtTp; it still strips the href if it is not allowed at this time.
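For readers wondering what a case-independent filter might look like, here is a minimal sketch. It is not the sanitizer referred to above; the allowlist contents and the `sanitizeHref` name are assumptions chosen for illustration. It leans on the WHATWG `URL` parser, which reports `protocol` in lowercase whatever the input casing was.

```javascript
// Hypothetical allowlist; a real project would choose its own.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'ftp:']);

// Returns a normalized href when the protocol is allowed, otherwise null
// (meaning the caller should strip the href).
function sanitizeHref(href) {
  let parsed;
  try {
    parsed = new URL(href);
  } catch (e) {
    return null; // relative or malformed input is rejected in this sketch
  }
  return ALLOWED_PROTOCOLS.has(parsed.protocol) ? parsed.href : null;
}

console.log(sanitizeHref('http://example.com'));  // http://example.com/
console.log(sanitizeHref('HtTp://example.com'));  // http://example.com/
console.log(sanitizeHref('JavaScript:alert(1)')); // null
```

Comparing the parsed `protocol` rather than the raw string means mixed-case input cannot slip past the allowlist or be wrongly rejected by it.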

The point is still that more tests are recommended, covering both the case where it complies and the case where it doesn't, i.e. be careful. :)

Apologies for the re-edits... attempting to make this clearer for the masses and to keep the noise level down.

styfle · 2018-12-06T17:46:19Z

Closing in favor of #1384

closes markedjs#1350

Make URL case insensitive.

8297da3

Now URL's starting with hTTp or Https will be captured by the regular expression.

roygold7 closed this Oct 4, 2018

roygold7 reopened this Oct 4, 2018

roygold7 added 2 commits October 4, 2018 15:02

Add case insensitive flag to rules.url

8c855f3

Fixed node legacy version

3725f9e

styfle reviewed Oct 4, 2018

Changed "i" to 'i' for linter

408294e

unit test for case insensitivity

971304a

styfle reviewed Oct 4, 2018

UziTech mentioned this pull request Dec 5, 2018

Make autolinks case insensitive #1384

Merged

4 tasks

styfle closed this Dec 6, 2018

zhenalexfan pushed a commit to zhenalexfan/MarkdownHan that referenced this pull request Nov 8, 2021

make links case insensitive

7fd544d

closes markedjs#1350
